arXiv:1504.02306v2 [cs.DS] 15 Feb 2016 


Optimal induced universal graphs and adjacency labeling for trees 


Stephen Alstrup, Søren Dahlgaard, and Mathias Bæk Tejs Knudsen

University of Copenhagen, 

{s.alstrup,soerend,knudsen}@di.ku.dk


Abstract 

We show that there exists a graph $G$ with $O(n)$ nodes such that any forest of $n$ nodes
is a node-induced subgraph of $G$. Furthermore, for constant arboricity $k$, the result implies
the existence of a graph with $O(n^k)$ nodes that contains all $n$-node graphs of arboricity $k$ as
node-induced subgraphs, matching an $\Omega(n^k)$ lower bound. The lower bound and previously best
upper bounds were presented in Alstrup and Rauhe [FOCS’02]. Our upper bounds are obtained
through a $\log_2 n + O(1)$ labeling scheme for adjacency queries in forests.

We hereby solve an open problem being raised repeatedly over decades, e.g. in Kannan, 
Naor, Rudich [STOC’88], Chung [J. of Graph Theory’90], Fraigniaud and Korman [SODA’10].

1 Introduction


An adjacency labeling scheme for a given family of graphs assigns labels to the vertices of each graph
from the family such that given the labels of two vertices from a graph, and no other information,
it is possible to determine whether or not the vertices are adjacent in the graph. The labels are
assumed to be bit strings, and the goal is to minimize the maximum label size. A $k$-bit labeling
scheme (sometimes denoted $k$ labeling scheme) uses at most $k$ bits per label. In information theory
the study of adjacency labeling schemes goes back to the 1960’s [22, 23], and efficient labeling schemes
were introduced in [57, 69]. Adjacency labeling schemes are also called implicit representations of
graphs [83, 90].

As an example let $\mathcal{A}_n$ denote the family of forests with $n$ nodes. Given a forest $F \in \mathcal{A}_n$ do
the following: Root the trees of $F$ and assign each node an id from $[0, n - 1]$. Let the label
of each node be its id appended with the id of its parent. A test for adjacency is then simply to
test whether the id of one of the nodes equals the stored parent id of the other node. The labels
assigned to the nodes have length $2\lceil \log n \rceil$ bits¹.
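The parent-pointer scheme just described can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the forest is given as a hypothetical `parent` array, labels are Python strings of `0`/`1` characters, and a root stores its own id in the parent field (a detail the text leaves open).

```python
from math import ceil, log2

def encode_forest(parent):
    """parent[v] is the id of v's parent, or None if v is a root.
    Returns one label per node: the node id followed by its parent id,
    each written with `width` bits (a root stores its own id)."""
    n = len(parent)
    width = max(1, ceil(log2(n)))  # bits needed for one id
    labels = []
    for v, p in enumerate(parent):
        stored = v if p is None else p
        labels.append(format(v, f"0{width}b") + format(stored, f"0{width}b"))
    return labels

def adjacent(label_a, label_b):
    """Decide adjacency from the two labels alone."""
    half = len(label_a) // 2
    id_a, par_a = label_a[:half], label_a[half:]
    id_b, par_b = label_b[:half], label_b[half:]
    if id_a == id_b:
        return False
    # adjacent iff one node's id equals the stored parent id of the other
    return par_a == id_b or par_b == id_a
```

The decoder never sees the forest, only two labels, which is exactly the restriction imposed on a labeling scheme.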

Closely related to adjacency labeling schemes are induced-universal graphs, also studied in the
1960’s [67, 79]. A graph $G = (V, E)$ is said to be an induced-universal graph for a family $\mathcal{F}$ of graphs
if it contains all graphs in $\mathcal{F}$ as node-induced subgraphs. A graph $H = (V', E')$ is contained in $G$
as a node-induced subgraph if $V' \subseteq V$ and $E' = \{(v,w) \mid v,w \in V' \wedge (v,w) \in E\}$. We define $g_v(\mathcal{F})$
to be the smallest number of nodes in any induced-universal graph for $\mathcal{F}$. From [57] (some details
given in [11, 83]) we have:

Theorem 1 ([57]). A family $\mathcal{F}$ of graphs has a $k$-bit adjacency labeling scheme with unique labels
iff $g_v(\mathcal{F}) \le 2^k$.

Labels being unique means that no two nodes in the same graph from $\mathcal{F}$ will be given the same
label.
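One direction of Theorem 1 can be illustrated concretely: take one vertex per possible label and connect two vertices whenever the decoder accepts the pair. The sketch below does this for the parent-pointer scheme, with labels modelled as hypothetical `(id, parent_id)` pairs rather than bit strings; it is an illustration of the idea, not the construction of [57].

```python
from itertools import combinations, product

def decoder(a, b):
    """Adjacency decoder of the parent-pointer scheme on (id, parent_id) pairs."""
    return a[0] != b[0] and (a[1] == b[0] or b[1] == a[0])

def universal_graph(n):
    """One vertex per possible label, one edge per pair the decoder accepts."""
    vertices = list(product(range(n), repeat=2))
    edges = {frozenset((a, b))
             for a, b in combinations(vertices, 2) if decoder(a, b)}
    return vertices, edges

def embed(parent):
    """Map each forest node to its label, i.e. to a vertex of the universal graph.
    A root stores its own id in the parent field (an assumption of this sketch)."""
    return [(v, v if p is None else p) for v, p in enumerate(parent)]
```

Because labels are unique, `embed` is injective, and the subgraph induced by the image is the forest itself. The graph has $n^2$ vertices, in line with the quadratic bound that follows from the $2\lceil \log n \rceil$-bit scheme.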

Combining the $2\lceil \log n \rceil$-bit labeling scheme above with Theorem 1 gives $g_v(\mathcal{A}_n) = O(n^2)$. Closely
related, a universal graph for $\mathcal{F}$ is a graph that contains each graph from $\mathcal{F}$ as a subgraph, not
necessarily induced. The challenge is to construct universal graphs with as few edges as possible.
Let $f_e(\mathcal{F})$ denote the minimum number of edges in a universal graph for $\mathcal{F}$. In a series of papers [12,
27, 28, 29, 30, 32, 72] it was established that $f_e(\mathcal{A}_n) = \Theta(n \log n)$. Let $G = (V, E)$ be any universal
graph for any family $\mathcal{H}$ of acyclic graphs. In [27] Chung shows $g_v(\mathcal{H}) \le 2|E| + |V|$ and, combined
with bounds for $f_e(\mathcal{A}_n)$, concludes that $g_v(\mathcal{A}_n) = O(n \log n)$. As the bounds for $f_e(\mathcal{A}_n)$ are tight
it is not possible to improve the bounds for $g_v(\mathcal{A}_n)$ using the techniques of [27]. However, for the
family of forests with bounded degree and $n$ nodes, there exists a universal
graph with $n$ nodes and $O(n)$ edges [15, 16], giving an induced-universal graph with $O(n)$ nodes for this family [27].

Chung’s results [27] combined with Theorem 1 give a $\log n + \log\log n + O(1)$ adjacency labeling
scheme for forests, and $\log n + O(1)$ for bounded degree forests. In 2002 Alstrup and Rauhe [11]
gave a $\log n + O(\log^* n)$ adjacency labeling scheme for general forests². Adjacency labeling schemes
using $\log n + O(1)$ bits are given in [19, 20, 21] for bounded degree forests and caterpillars, in [45]
for bounded depth trees, and in [44] for the case allowing 1-sided errors. Adjacency labeling schemes
for forests are also considered in [3, 58]. Table 1 summarizes the results.

While minimizing the label size is the main goal of a labeling scheme, we sometimes also seek 
to reduce the running time. The time used to assign labels to the nodes is called the encoding time, 

¹Throughout this paper we use $\log$ for $\log_2$.

²$\log^*$ is the number of times $\log$ should be iterated to get a constant.

Graph family                 Upper bound              Reference
Forests of bounded degree    $O(n)$                   [27]
Forests                      $n 2^{O(\log^* n)}$      [11]
Caterpillars                 $O(n)$                   [19]
Trees of depth $d$           $O(n d^{?})$             [45]
Forests                      $O(n)$                   This paper


Table 1: Size of induced-universal graphs for various families of forests. 


and the time used to decide whether two nodes are adjacent or not is called the decoding time.
In [19, 20, 21] described above the encoding time is $O(n)$ and the decoding time is $O(1)$.

Addressing a problem repeatedly raised over the last decades, e.g. in [3, 19, 27, 28, 29, 44, 45, 49, 57],
we show:

Theorem 2. There exists an adjacency labeling scheme for $\mathcal{A}_n$ using unique labels of length
$\log n + O(1)$ bits with $O(1)$ decoding time and $O(n)$ encoding time in the word-RAM model.

In our solution the decoder does not know n in advance. The importance of the problem is 
emphasized by it repeatedly and explicitly being raised as a central open problem (see appendix A). 
Theorem 2 establishes that adjacency labeling in forests requires $\log n + \Theta(1)$ bits. To see this,
consider the path of length $n$ as well as the star on $n$ nodes. These two graphs may share at most
$n/2$ labels, giving a $\log 1.5n = \log n + \Omega(1)$ lower bound. We note that this lower bound may be
slightly improved using the result of [73].

1.1 Graphs with bounded arboricity 

Let $\mathcal{F}$ and $\mathcal{Q}$ be two families of graphs and let $G$ be an induced-universal graph for $\mathcal{F}$. Suppose
that every graph in the family $\mathcal{Q}$ can be edge-partitioned into $k$ parts, each of which forms a graph
in $\mathcal{F}$. In this case, it was shown by Chung [27] that $g_v(\mathcal{Q}) \le |V(G)|^k$. She considered the family
$\mathcal{A}_n^k$ of graphs with arboricity $k$ and $n$ nodes. A graph has arboricity $k$ if the edges of the graph can
be partitioned into at most $k$ forests. By combining the above result with $g_v(\mathcal{A}_n) = O(n \log n)$ she
showed that $g_v(\mathcal{A}_n^k) = O((n \log n)^k)$, improving the bound from [57]. For constant arboricity
$k$, it follows from [11] that $\Omega(n^k) = g_v(\mathcal{A}_n^k) \le n^k 2^{O(\log^* n)}$. Combining Chung’s reduction [27] with
Theorems 1 and 2 we show that:

Theorem 3. There exists an induced-universal graph of size $O(n^k)$ for the family of graphs with
constant arboricity $k$ and $n$ nodes.

Achieving results for bounded degree graphs by reduction to bounded arboricity graphs is, e.g.,
used in [57]. This can be done as graphs with bounded degree $d$ have arboricity bounded by
$\lfloor d/2 \rfloor + 1$ [25, 64].

1.2 Adjacency labeling and induced-universal graphs for other families 

Induced-universal graphs (and hence adjacency labeling schemes) are given for tournaments [14,
68], hereditary graphs [65, 81], threshold graphs [56], special commutator graphs [78], bipartite
graphs [66], bounded degree graphs [85], and other cases [17, 74]. Using universal graphs constructed
by Babai et al. [12], Bhatt et al. [16] and Chung et al. [28, 29, 30, 32], Chung [27] obtains the
current best bounds for e.g. induced-universal graphs for bounded degree graphs being planar or
outerplanar. Many other results use reductions from [27], e.g. the induced-universal graphs for
bounded degree graphs [24, 39]. The result from [39], as many others, is achieved by reduction to
a universal graph with bounded degree [4, 5]. Other results for universal graphs are, e.g., for families
of graphs such as cycles [18], forests [31, 42], bounded degree forests [15, 47], and graphs with
bounded path-width [84]. In [9] the authors give a $(\lceil n/2 \rceil + 4)$-bit adjacency labeling scheme for general
undirected graphs, improving the $(\lfloor n/2 \rfloor + \lceil \log n \rceil)$ bound of [67], almost matching an $(n-1)/2$ lower
bound [57, 67]. An overview of induced-universal graphs and adjacency labeling can be found in [9].

1.3 Second order terms for labeling schemes are theoretically significant 

Above it is shown that for adjacency labeling significant work has been done optimizing the second 
order term. This is also true for other labeling scheme operations. E.g. the second order term
in the ancestor relationship is improved in a sequence of STOC/SODA papers [2, 6, 10, 45, 46]
(and [1, 59]) to $O(\log\log n)$, giving labels of size $\log n + O(\log\log n)$. Lastly, an algorithm giving
both a simple and optimal scheme was given in [35]. Somewhat related, succinct data structures
(see, e.g., [36, 40, 41, 70, 71, 75]) focus on the space used in addition to the information theoretic 
lower bound, which is often a lower order term with respect to the overall space used. 

1.4 Labeling schemes in various settings and applications 

By using labeling schemes, it is possible to avoid costly access to large global tables, computing 
instead locally and distributed. Such properties are used in applications such as XML search 
engines [2], network routing and distributed algorithms [34, 37, 43, 89], dynamic and parallel settings 
[33, 62], and various other applications [61, 76, 80]. 

Various computability requirements are sometimes imposed on labeling schemes [2, 57, 60]. This 
paper assumes the RAM model and mentions the time needed for encoding and decoding in addition 
to the label size. 

Closely related to adjacency are small distances in trees. This is studied by Alstrup et al. in [7], who
among other things give a $\log n + O(\log\log n)$ labeling scheme supporting both parent and sibling
queries. General distance labeling schemes for various families of graphs exist, e.g., for trees [7, 77],
bounded tree-width, planar and bounded degree graphs [52], some non-positively curved plane graphs [26],
interval [50] and permutation graphs [13], and general graphs [53, 91]. In [52] it is proved that
distance labels require $\Theta(\log^2 n)$ bits for trees. Approximate distance labeling schemes are also well
studied; see e.g., [54, 55, 63, 86, 87, 88]. An overview of distance labeling schemes can be found
in [8], and a more general survey of labeling schemes can be found in [51].

2 Preliminaries 

In this section we introduce some well-known results and notation. Throughout this paper we use
the convention that $\lg x = \max(1, \log_2 x)$ for convenience. We assume the word-RAM model of
computation.

Trees Let $\mathcal{T}_n$ denote the family of all rooted trees of size $n$ and let $T \in \mathcal{T}_n$. We denote the nodes
of $T$ by $V(T)$ and the edges by $E(T)$. We let $|T|$ denote the number of nodes in $T$. For a node
$u \in V(T)$, we let $T_u$ denote the subtree of $T$ rooted in $u$. A node $u$ is an ancestor of a node $v$
iff it is on the unique path from $v$ to the root. In this case we also say that $v$ is a descendant of
$u$. A caterpillar is a tree whose non-leaf nodes induce a path. Throughout the paper we will only
consider adjacency labeling in trees, as we may add an “imaginary root” to any forest on $n$ nodes,
turning it into a tree of size $n + 1$. To do this we expend at most one extra bit to distinguish this
root from actual nodes.

Heavy-light For a node $u$ with children $children(u) = \{v_1, \ldots, v_k\}$ with $|T_{v_k}| \ge |T_{v_i}|$ for all
$i < k$, we say that the edge $(u, v_k)$ is heavy, and the remaining edges $(u, v_i)$ are light. We say that
$heavy(u) = v_k$ is the heavy child of $u$. A node $u$ for which the edge $(parent(u), u)$ is light is called
an apex node. For convenience we also define the root to be an apex node. For a node $u$, we define
$children(u) \setminus \{heavy(u)\}$ to be the light children of $u$. This is called a heavy-light decomposition [82]
as it decomposes the tree into paths of heavy edges (heavy paths) connected by light edges. We
define the light subtree of a node $u$ to be $T_u^\ell = T_u \setminus T_{heavy(u)}$. For a leaf $u$, $T_u^\ell = u$. The light
depth of a node $u$ is the number of light edges on the path from $u$ to the root. The light height of
a node $u$ is the maximum number of light edges on a path from $u$ to a leaf in $T_u$.

Lemma 1. [82] Given a tree $T$ and $u \in V(T)$ with light height $x$, $|T_u| \ge 2^{x+1} - 1$.
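The heavy-light notions above are easy to compute bottom-up. The following Python sketch is illustrative only: the tree is given as a hypothetical `children` adjacency list, ties between equally large children are broken by list order, and the function name is ours.

```python
def heavy_light(children, root=0):
    """children[u] lists the children of u.
    Returns (size, heavy, apex, light_height), one entry per node."""
    n = len(children)
    order, stack = [], [root]
    while stack:                       # iterative preorder traversal
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = [1] * n
    heavy = [None] * n
    light_height = [0] * n
    for u in reversed(order):          # descendants are handled before u
        for c in children[u]:
            size[u] += size[c]
        if children[u]:
            heavy[u] = max(children[u], key=lambda c: size[c])
        for c in children[u]:
            extra = 0 if c == heavy[u] else 1   # a light edge counts one
            light_height[u] = max(light_height[u], light_height[c] + extra)
    apex = [True] * n                  # the root and all light children
    for u in range(n):
        if heavy[u] is not None:
            apex[heavy[u]] = False     # heavy children are not apex nodes
    return size, heavy, apex, light_height
```

On any input the returned values can be checked against Lemma 1, i.e. `size[u] >= 2 ** (light_height[u] + 1) - 1` for every node `u`.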

Bit strings A bit string $s$ is a member of the set $\{0,1\}^*$. We denote the length of a bit string $s$ by
$|s|$, the $i$th bit of $s$ by $s_i$, and the concatenation of two bit strings $s, s'$ by $s \circ s'$ (i.e. $s = s_1 \circ s_2 \circ \cdots \circ s_{|s|}$).
We say that $s_1$ is the most significant bit of $s$ and $s_{|s|}$ is the least significant bit. For an integer $x$ we
let $0^x$ and $1^x$ denote the strings consisting of exactly $x$ 0s and 1s respectively. Let $a$ be an integer
and let $s$ be the bit string representation of $a$. Define the function $wlsb(a, k)$ to be $s_1 \circ s_2 \circ \cdots \circ s_{|s|-k}$,
i.e. the bit string of $a$ without the $k$ least significant bits. When $k > |s|$ we define $wlsb(a, k)$ to be
the empty string. When constructing a labeling scheme we often wish to concatenate several bit
strings of unknown length. We may do this using the Elias $\gamma$ code [38] to encode a length $k$ bit
string with $2k$ bits and decode it in $O(1)$ time for $k = O(w)$³, using standard bit operations.

For an integer $a$ we will often use $a$ to denote the bit string representation of $a$ when it is clear
from the context. We will use $[a]_\gamma$ to denote the Elias $\gamma$ encoding of $a$.
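As a small illustration of these two building blocks, the sketch below implements the textbook Elias $\gamma$ code for integers $x \ge 1$ and the function $wlsb$, with bit strings modelled as Python strings. The exact variant used in the paper is not spelled out (it must also handle the value 0, e.g. by encoding $x + 1$), so this should be read as an assumption-laden sketch; a word-RAM implementation would use bit operations instead of string slicing.

```python
def elias_gamma(x):
    """Textbook Elias gamma code of an integer x >= 1:
    (len - 1) zeros followed by the binary representation of x."""
    if x < 1:
        raise ValueError("Elias gamma is defined for x >= 1")
    b = bin(x)[2:]
    return "0" * (len(b) - 1) + b

def elias_gamma_decode(s):
    """Decode one codeword from the front of s; return (value, rest of s)."""
    zeros = 0
    while s[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(s[zeros:end], 2), s[end:]

def wlsb(a, k):
    """Binary representation of a without its k least significant bits."""
    s = bin(a)[2:]
    return s[:len(s) - k] if k < len(s) else ""
```

Since a codeword announces its own length, several of them can be concatenated and still be split apart again, which is how the labels later in the paper are parsed.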

Labeling schemes An adjacency labeling scheme for trees of size $n$ consists of an encoder, $e$, and
a decoder, $d$. Given a tree $T \in \mathcal{T}_n$, the encoder computes a mapping $e_T : V(T) \to \{0,1\}^*$ assigning
a label to each node $u \in V(T)$. The decoder is a mapping $d : \{0,1\}^* \times \{0,1\}^* \to \{True, False\}$ such
that given any tree $T \in \mathcal{T}_n$ and any pair of nodes $u, v \in V(T)$ we have $d(e_T(u), e_T(v)) = True$ iff
$(u, v) \in E(T)$. Note that the decoder does not know $T$. The size of a labeling scheme is defined as
the maximum label size $|e_T(u)|$ over all trees $T \in \mathcal{T}_n$ and all nodes $u \in V(T)$. If for all trees $T \in \mathcal{T}_n$
the mapping $e_T$ is injective we say that the labeling scheme assigns unique labels. The labeling
schemes constructed in this paper all assign unique labels and the decoder does not know $n$.

³Here, $w$ is the word size.

Approximation Given a non-negative integer $a$ and a real number $\varepsilon > 0$, a $(1 + \varepsilon)$-approximation
of $a$ is an integer $b$ such that $a \le b < (1 + \varepsilon)a$. We also define $b = 0$ to be the unique
$(1 + \varepsilon)$-approximation of $a = 0$.

Lemma 2. Given an integer $a$ and a number $\varepsilon \in (0, 1]$, we can find a $(1 + \varepsilon)$-approximation and
represent it using $O(\lg\lg a + \lg\frac{1}{\varepsilon})$ bits. Furthermore, if $\varepsilon = \frac{1}{\delta}$ where $\delta$ is a positive integer that can
be stored using $O(1)$ words, we can find this approximation in $O(1)$ time.


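Proof. We will use a single bit to distinguish between the cases $a = 0$ and $a > 0$, so assume $a > 0$.
Let $\delta = \lceil 1/\varepsilon \rceil$ and $\varepsilon' = \delta^{-1}$. Let $k = \lceil \log_{1+\varepsilon'} a \rceil$. Then $(1 + \varepsilon')^k \ge a > (1 + \varepsilon')^{k-1}$. Hence if we let
$b = \lfloor (1 + \varepsilon')^k \rfloor$ we have $a \le b < a(1 + \varepsilon') \le a(1 + \varepsilon)$. In order to encode $b$ it suffices to encode $\delta$ and
$k$. We can do this using $2\lceil \lg\delta \rceil + 2\lceil \lg k \rceil$ bits using the Elias $\gamma$ coding. Note that:

$$k - 1 < \log_{1+\varepsilon'} a = \frac{\log_2 a}{\log_2(1 + \varepsilon')}$$

Taking $\log_2$ gives:

$$\log_2(k - 1) < \log_2\log_2 a - \log_2\log_2(1 + \varepsilon')$$
$$= \log_2\log_2 a + O\left(\lg\tfrac{1}{\varepsilon'}\right)$$
$$= \log_2\log_2 a + O\left(\lg\tfrac{1}{\varepsilon}\right)$$

Hence $\lg k = O(\lg\lg a + \lg\frac{1}{\varepsilon})$, and since $\lg\delta \le 1 + \lg\frac{1}{\varepsilon}$ the proof is finished. □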

We will use Approx($a$, $\varepsilon$) to denote a function returning a $(1 + \varepsilon)$-approximation of $a$ as described
above.
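The construction in the proof of Lemma 2 can be followed step by step in code. The sketch below is an illustration under assumptions of our own: it uses exact rational arithmetic and a simple loop to find $k$, so it does not attempt the $O(1)$ running time claimed in the lemma, and the function name `approx` and its return convention are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def approx(a, eps):
    """Return (b, delta, k) with a <= b < (1 + eps) * a, following the proof
    of Lemma 2. eps lies in (0, 1]; (delta, k) is what a label would store."""
    if a == 0:
        return 0, None, None               # b = 0 is the unique approximation of 0
    delta = ceil(1 / Fraction(eps))        # eps' = 1/delta <= eps
    base = Fraction(delta + 1, delta)      # 1 + eps'
    k, power = 0, Fraction(1)
    while power < a:                       # smallest k with (1 + eps')^k >= a
        power *= base
        k += 1
    return floor(power), delta, k
```

Only the pair `(delta, k)` needs to be stored, and `k` is roughly $\log a / \varepsilon$, which is where the $O(\lg\lg a + \lg\frac{1}{\varepsilon})$ bit bound comes from.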

3 A simple scheme for caterpillars 

As a warmup, we describe a simple adjacency labeling scheme of size $\lg n + O(1)$ for caterpillars.
The idea is to use a variant of this scheme recursively when labeling general trees. The scheme we 
present uses ideas similar to that of [19]. 

Let $p = (u_1, \ldots, u_{|p|})$ be a longest path of the caterpillar and root the tree in $u_1$. We assign an
id and an interval $I(u_i) = [id(u_i), id(u_i) + l(u_i))$ to each node $u_i$, such that $id(v) \in I(u_i)$ iff $v$ is a
non-root apex node (all leaves except $u_{|p|}$ are apex nodes) and $u_i$ is the parent of $v$. The ids of the
$u_i$s are assigned such that given the label of $u_i$ we can deduce $id(u_{i+1})$ for $i < |p|$. We first calculate
the interval sizes $l$ and next assign the ids. Both steps can be done in $O(n)$ time.

Interval sizes Let $\gamma_i = \lceil \lg |T_{u_i}^\ell| \rceil$. For each node $u_j$ now define the $|p|$-dimensional vector $\beta_j$ as
$\beta_j(i) = \gamma_j - |i - j|$. Let $k_i = \max_{j = 1, \ldots, |p|} \beta_j(i)$. This ensures that $(k_i - k_{i+1}) \in \{-1, 0, 1\}$ for all
$i \in \{1, \ldots, |p| - 1\}$. The process is illustrated in Figure 1. The interval size of node $u_i$ is now set to
$l(u_i) = 2^{k_i}$.

Figure 1: Example of how the $\beta_j$s are used to ensure that neighbouring nodes have
$(k_i - k_{i+1}) \in \{-1, 0, 1\}$.

Id assignment The idea is to assign $id(u_i)$ such that the $k_i$ least significant bits of $id(u_i)$ are all
0. We first assign the id for $u_1$ and its children, then $u_2$ and its children, etc. The procedure is as
follows:

1. Assign $id(u_i) = x$, where $x$ is the smallest integer having 0 as the $k_i$ least significant bits
satisfying $x \ge id(u_{i-1}) + l(u_{i-1})$. For $u_1$ we set $id(u_1) = 0$.

2. Let $v_1, \ldots, v_{|T_{u_i}^\ell| - 1}$ be the light children of $u_i$. Assign $id(v_j) = id(u_i) + j$. Note that
$id(v_{|T_{u_i}^\ell| - 1}) < id(u_i) + l(u_i)$.


The label For a node $u_i \in p$ we assign the label

$\ell(u_i) = type(u_i) \circ [k_i]_\gamma \circ wlsb(id(u_i), k_i)$,
and for $v \notin p$, assign the label

$\ell(v) = type(v) \circ id(v)$.

Here $type(u)$ is 1 if $u \notin p$. Otherwise, $type(u_i)$ is 0xx, where xx is either 00, 01, 10 or 11, corresponding
to the following four cases: (00) $u_i = u_{|p|}$, (01) $k_i = k_{i+1} - 1$, (10) $k_i = k_{i+1}$, and (11)
$k_i = k_{i+1} + 1$.
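The interval sizes and the id assignment of the caterpillar scheme can be sketched as follows. The sketch assumes that the interval size is $l(u_i) = 2^{k_i}$ and that $\gamma_i = \lceil \lg |T_{u_i}^\ell| \rceil$; it takes as input a hypothetical list `light_sizes` with `light_sizes[i]` equal to one plus the number of light children of the $i$th path node (0-indexed), and it computes the $k_i$ in quadratic time for clarity rather than in the linear time mentioned in the text.

```python
def caterpillar_ids(light_sizes):
    """light_sizes[i] = |light subtree of u_i| = 1 + number of light children.
    Returns (k, ids, child_ids) for the path nodes, 0-indexed."""
    m = len(light_sizes)
    # gamma_i = ceil(lg s) with the convention lg x = max(1, log2 x)
    gamma = [max(1, (s - 1).bit_length()) for s in light_sizes]
    # k_i = max_j (gamma_j - |i - j|): neighbouring values differ by at most one
    k = [max(gamma[j] - abs(i - j) for j in range(m)) for i in range(m)]
    ids, child_ids = [], []
    next_free = 0
    for i in range(m):
        step = 1 << k[i]
        x = -(-next_free // step) * step   # smallest multiple of 2^k_i >= next_free
        ids.append(x)
        child_ids.append([x + j for j in range(1, light_sizes[i])])
        next_free = x + step               # the interval of u_i has size 2^k_i
    return k, ids, child_ids
```

Given $id(u_i)$, $k_i$ and the type bits, a decoder can recompute $id(u_{i+1})$ with the same rounding step, which is why only $wlsb(id(u_i), k_i)$ has to be stored for path nodes.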

Label size First, we let $N$ denote the maximum id assigned by the encoder. Then the label size
for a node $u_i \in p$ is $\le 3 + 2\lceil \lg k_i \rceil + \lceil \lg N \rceil - k_i$ and for $v \notin p$, it is $\le 1 + \lceil \lg N \rceil$. We will now bound
$N$:

Lemma 3. Given a caterpillar $T$ with $n$ nodes, the maximum id assigned by our encoder, $N$,
satisfies

$N \le 12n$.

Proof. First, observe that the number of ids skipped between $id(u_{i-1}) + l(u_{i-1})$ and $id(u_i)$ is at most
$2^{k_i} - 1$ as any set of $2^{k_i}$ consecutive integers must contain at least one integer with $k_i$ 0s as least
significant bits. Thus, the maximum id is bounded by $\sum_{i=1}^{|p|} (2^{k_i} - 1 + l(u_i)) = 2 \cdot \sum_{i=1}^{|p|} 2^{k_i} - |p|$
and we can bound this using

$$\sum_{i=1}^{|p|} 2^{k_i} \le \sum_{i=1}^{|p|} \sum_{j=1}^{|p|} 2^{\beta_j(i)} = \sum_{j=1}^{|p|} \sum_{i=1}^{|p|} 2^{\beta_j(i)} \le \sum_{j=1}^{|p|} \sum_{i=-\infty}^{\infty} 2^{\gamma_j - |i|} = \sum_{j=1}^{|p|} 3 \cdot 2^{\gamma_j}$$

Since $2^{\gamma_j} \le 2|T_{u_j}^\ell|$ and $\sum_{j=1}^{|p|} |T_{u_j}^\ell| = n$, the right-hand side is at most $6n$,
concluding that $N \le 12n - |p|$. □


Decoding Given the labels of $u, v \notin p$ we always answer False.

Now assume that we are given the label of at least one node $u_i \in p$. First we deduce $id(u_i)$ using
$[k_i]_\gamma$ and $wlsb(id(u_i), k_i)$. This also gives us $l(u_i) = 2^{k_i}$. Now there are two cases:

1. If the other label is for a node $v \notin p$, we simply read $id(v)$ and answer True if
$id(v) \in [id(u_i), id(u_i) + l(u_i))$. Otherwise we answer False.

2. If the other label is for $u_j \in p$, assume without loss of generality that $id(u_j) > id(u_i)$. If
$type(u_i) = 001$, set $x$ to be the smallest integer with the $k_i + 1$ least significant bits set to 0
satisfying $x \ge id(u_i) + l(u_i)$. If $x = id(u_j)$ answer True, otherwise answer False.

The other types can be handled similarly.


4 An optimal scheme for general trees 

In this section we prove Theorem 2. Similar to the caterpillar scheme presented in the previous
section we assign an id, $id(u)$, and interval, $I(u)$, to each node. The interval and id of a node are
assigned such that $id(v) \in I(u)$ iff $v \in T_u^\ell$. The label of a node $u$ will be assigned such that we can
infer the following information (loosely speaking) directly from the label:

• The id of the node $u$.

• The id of $u$’s heavy child, $heavy(u)$.

• The interval $I(u)$ containing the ids of all nodes in $u$’s light subtree.

• Auxiliary information to help decide whether $u$ is a light child of another node.

In order to store this information as part of the label, each node will be assigned an id with a
number of trailing zero bits proportional to the logarithm of its interval size, corresponding to the
$k_i$s of Section 3. Furthermore, we ensure that the interval size for a node $u$ is proportional to $|T_u^\ell|$
(or simply $|T_u|$ for apex nodes), and call this the light weight of $u$, denoted by $lw(u)$. Intuitively this
ensures that nodes with large subtrees have more “bits to spare”.

The labels are assigned using a similar two-step procedure as in Section 3. In the first step we
assign the light weight of each node using a recursive procedure, and in the second step we assign
the actual ids of the nodes based on the given weights. Both steps are handled in $O(n)$ time. In
order to bound the maximum id assigned we introduce the notion of path weights (to be defined
later). The path weight of a heavy path $p$ is denoted $pw(u)$, where $u$ is the apex node of $p$.

4.1 Weight classes and restricted light depth

The auxiliary information mentioned above is primarily used to determine adjacency between an
apex node and its parent. A classic way of doing this is to use the light depth of both nodes 
and check that it differs by exactly one. However, the light depth of a node with a small subtree 
could potentially be big in comparison, and thus we cannot afford to store it. To deal with this we 
introduce the following notion of weight classes and restricted light depth: 

Definition 1. Let $T$ be a rooted tree and $u$ some node in $T$. Define

$$\gamma(u) = \begin{cases} \lfloor \lg |T_u| \rfloor & \text{if } u \text{ is an apex node} \\ \lfloor \lg |T_u^\ell| \rfloor & \text{otherwise.} \end{cases}$$

The weight class of $u$ is defined as $wc(u) = \lfloor \lg \gamma(u) \rfloor$.

Definition 2. Let $T$ be a rooted tree and $u$ some node in $T$. Define $wtop(u)$ to be the ancestor of $u$
with smallest depth such that every node on the path from $u$ to $wtop(u)$ has weight class $\le wc(u)$.
The restricted light depth of $u$ is the number of light edges on the path from $u$ to $wtop(u)$ and is
denoted by $rld(u)$.

An illustration of these definitions can be seen in Figure 2.

Figure 2: Example of weight classes and restricted light depths in a tree. The dotted and solid lines
correspond to light and heavy edges respectively.

When assigning the interval $I(u)$, we will split it into a sub-interval for each weight class
$i \le wc(u)$.

We will now show some properties related to weight classes and restricted light depth. We will
use the definitions of $\gamma(u)$ and $wtop(u)$ as described in Definitions 1 and 2.



Lemma 4. Let $u$ be any node, then $rld(u) \le 2\gamma(u) + 1$.

Proof. Let $v$ be the apex node on the path from $u$ to $wtop(u)$ with the smallest depth. (If no such
node exists, $rld(u) = 0$ and the result is trivial.) We note that $v$ must have light height $\ge rld(u) - 1$,
so by Lemma 1 $|T_v| \ge 2^{rld(u)} - 1$ and therefore $\gamma(v) \ge rld(u) - 1$. So

$$2\gamma(u) \ge 2^{wc(u)+1} \ge 2^{wc(v)+1} \ge \gamma(v) \ge rld(u) - 1$$

which finishes the proof. □

Lemma 5. Let $u$ be an ancestor of $v$ such that $u$ is an apex node and $wc(u) = wc(v)$. Let $k$ be the
number of light edges on the path from $u$ to $v$. Then $rld(v) = rld(u) + k$.

Proof. Any node in $u$’s subtree must have weight class $\le wc(u)$ since $u$ is an apex node. Since
$wc(u) = wc(v)$ every node on the path from $v$ to $wtop(u)$ must have weight class $\le wc(v)$. Thus
$wtop(v) = wtop(u)$ and there are $rld(u) + k$ light edges on the path from $v$ to $wtop(u)$, i.e.
$rld(v) = rld(u) + k$. □

Lemma 6. Let $u$ be the parent of an apex node $v$. If $wc(v) < wc(u)$ then $rld(v) = 0$, and if
$wc(v) = wc(u)$ then $rld(v) = rld(u) + 1$.

Proof. If $wc(v) < wc(u)$ then $v$ has restricted light depth 0, so assume that $wc(u) = wc(v)$. Let
$w$ be the apex node on $u$'s heavy path (possibly $u$ itself). First assume that $wc(w) = wc(u)$.
By Lemma 5, $rld(u) = rld(w)$ and $rld(v) = rld(w) + 1$ and the claim is true. Now assume that
$wc(w) > wc(u)$. Then $rld(u) = 0$ and $rld(v) = 1$ and the claim is true as well. Since $wc(w) < wc(u)$
is impossible the proof is finished. □

4.2 Weight assignment 

We will now see how to assign path weights and light weights to the nodes. The idea is to consider
an entire heavy path as a “recursive caterpillar” and use ideas similar to those of Section 3. Consider
any heavy path $p = (u_1, u_2, \ldots, u_{|p|})$ in order where $u_1$ is the apex node. For each $u \in p$ we do the
following:

1. For each light child $v$ of $u$ we recursively calculate $pw(v)$.

2. For every weight class $i \le wc(u)$, let $b_i(u)$ be the sum of $pw(v)$ for all light children $v$ of $u$ with
weight class $wc(v) = i$.

3. We use the convention that $a_0(u) = 0$, and for $i = 1, \ldots, wc(u)$ we let $a_i(u)$ be a
$(1 + \gamma(u)^{-3})$-approximation of $a_{i-1}(u) + b_i(u)$.

4. We then define the light weight of $u$ as $lw(u) = 1 + a_{wc(u)}(u)$.

For each $i = 1, 2, \ldots, |p|$ we let $k'(u_i) = \gamma(u_i) - \lceil 2 \lg \gamma(u_i) \rceil + 1$. We choose $k(u_1), \ldots, k(u_{|p|})$ such
that $k(u_i) \ge k'(u_i)$ for every $i = 1, \ldots, |p|$ and $k(u_i) - k(u_{i+1}) \in \{-1, 0, 1\}$ for all $i = 1, \ldots, |p| - 1$.
We do this in the same manner as in Section 3 when we constructed the labeling scheme for the
caterpillar, see Figure 1.
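Choosing the values $k(u_i)$ from the lower bounds $k'(u_i)$ amounts to two linear sweeps over the heavy path. The following Python sketch illustrates this; the function name `smooth_k` and the list-based input are our own.

```python
def smooth_k(k_prime):
    """Return k with k[i] >= k_prime[i] and |k[i] - k[i+1]| <= 1,
    using one left-to-right and one right-to-left sweep."""
    k = list(k_prime)
    for i in range(1, len(k)):               # no drop of more than one going right
        k[i] = max(k[i], k[i - 1] - 1)
    for i in range(len(k) - 2, -1, -1):      # no drop of more than one going left
        k[i] = max(k[i], k[i + 1] - 1)
    return k
```

The result coincides with $\max_j (k'(u_j) - |i - j|)$, i.e. the same quantity as $k_i$ in Section 3 with $k'$ in place of $\gamma$, but is obtained in time linear in the length of the path.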

The path weight of $u_1$ is defined as $pw(u_1) = \sum_{i=1}^{|p|} (lw(u_i) + 2^{k(u_i)} - 1)$. By this definition, the
path weight of a leaf apex node is 1.

Algorithm 1: Assign-Weight

input : Heavy path $p = (u_1, \ldots, u_t)$ represented by $u_1$.
output: path weight of $p$.

for $i = 1 \to t$ do
    $a_0(u_i) \gets 0$
    for $j = 1 \to wc(u_i)$ do
        $b_j \gets 0$
        for $v \in \{w \in$ Light-Children($u_i$) $\mid wc(w) = j\}$ sorted by subtree size do
            $b_j \gets b_j +$ Assign-Weight($v$)
        end
        $a_j(u_i) \gets$ Approx($a_{j-1}(u_i) + b_j$, $\gamma(u_i)^{-3}$)
    end
    $lw(u_i) \gets 1 + a_{wc(u_i)}(u_i)$
end
$k(u_1) \gets \gamma(u_1) - \lceil 2 \lg \gamma(u_1) \rceil + 1$
for $i = 2 \to t$ do
    $k(u_i) \gets \max(\gamma(u_i) - \lceil 2 \lg \gamma(u_i) \rceil + 1, k(u_{i-1}) - 1)$
end
for $i = t - 1 \to 1$ do
    $k(u_i) \gets \max(k(u_i), k(u_{i+1}) - 1)$
end
$pw(u_1) \gets 0$
for $i = 1 \to t$ do
    $pw(u_1) \gets pw(u_1) + lw(u_i) + 2^{k(u_i)} - 1$
end
return $pw(u_1)$





Pseudocode for the function Assign-Weight is available in Algorithm 1. 

The main technical part of this paper is to show that calling Assign-Weight ensures that
$pw(u) = O(|T_u|)$ for all apex nodes $u \in T$. This is used to show that the maximum id assigned
by our labeling scheme is $O(n)$ and thus takes $\lg n + O(1)$ bits to store. Intuitively this is the case
since the quality of the approximation used in a node $u$ improves as the size of $u$’s subtree increases.
Specifically, we will use the following lemma, which is proved in Section 5.

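Lemma 7. Let $T$ be a tree rooted in $r$ and let $u \in T$ be any apex node with light height $x$. After
calling Assign-Weight($r$) it holds that:

$$pw(u) \le 3|T_u| \cdot \prod_{i=1}^{x} \left(1 + \frac{1}{i^2}\right) \qquad (1)$$

Furthermore, for any node $v \in T$ it holds that

$$lw(v) \le 3|T_v^\ell| \prod_{i=1}^{z} \left(1 + \frac{1}{i^2}\right) \cdot \left(1 + \frac{1}{(z+1)^2}\right)^{1/2} \qquad (2)$$

where $z$ is the maximum light height of any light child of $v$.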


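Corollary 1. Let $T$ be a tree rooted in $r$ and let $u \in T$ be any apex node and $v \in T$ be any node.
After calling Assign-Weight($r$) it holds that:

$pw(u) \le 3e^{\pi^2/6} |T_u|$, $\quad lw(v) \le 3e^{\pi^2/6} |T_v^\ell|$

Proof. Let $u$ be an apex node with light height $x$. Then:

$$pw(u) \le 3|T_u| \cdot \prod_{i=1}^{x} \left(1 + \frac{1}{i^2}\right)$$
$$\le 3|T_u| \cdot \exp\left(\sum_{i=1}^{x} \frac{1}{i^2}\right)$$
$$\le 3|T_u| \cdot \exp\left(\sum_{i=1}^{\infty} \frac{1}{i^2}\right)$$
$$= 3e^{\pi^2/6} |T_u|.$$

The proof for $lw(v)$ is similar. □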

4.3 Id assignment 

We create a procedure Assign-Id($u$, $s$) and use it to assign ids to the nodes in the tree. The
procedure takes two parameters: $u$, the node to which we want to assign the id, and $s$, a lower
bound on the id to be assigned. The function ensures that $id(u) \in [s, s + 2^{k(u)} - 1]$ has at least
$k(u)$ trailing zero bits and also assigns an id to every node in $u$’s subtree recursively. We assign ids
to every node in the tree by calling Assign-Id($r$, 0), where $r$ is the root of the tree. The procedure
goes as follows:

1. We let $id(u)$ be the unique integer in $[s, s + 2^{k(u)} - 1]$ which has at least $k(u)$ trailing zeros
in its binary representation.

2. We let $C_1, \ldots, C_{wc(u)}$ denote the partition of $u$’s light children such that every child $v$ with
weight class $wc(v) = i$ is contained in $C_i$.

3. Fix $i$ in increasing order. We assign the ids to the nodes in $C_i$ in the following manner. For
convenience say that $C_i = \{v_1, \ldots, v_{|C_i|}\}$. We then let $t_1 = id(u) + a_{i-1}(u) + 1$. For each
$j = 1, \ldots, |C_i|$ we call Assign-Id($v_j$, $t_j$) and set $t_{j+1} = t_j + pw(v_j)$.

4. Lastly, for the heavy child $v$ of $u$ we call Assign-Id($v$, $id(u) + lw(u)$).

By the above definition we see that for any node $u$ and any light child $v$ of $u$ we have
$id(v) \in (id(u) + a_{wc(v)-1}(u), id(u) + a_{wc(v)}(u)]$. We also have that $id(u) = id(v) \Leftrightarrow u = v$. Finally, for any
two intervals $I(u), I(v)$ either one is contained in the other or they are disjoint.
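The first step of Assign-Id, picking the unique integer in $[s, s + 2^{k} - 1]$ with at least $k$ trailing zero bits, is a constant-time rounding operation on a word-RAM. A minimal Python sketch (the name `first_id` is ours) is:

```python
def first_id(s, k):
    """The unique integer in [s, s + 2^k - 1] whose k lowest bits are zero,
    i.e. s rounded up to the next multiple of 2^k."""
    return ((s + (1 << k) - 1) >> k) << k
```

Uniqueness holds because any $2^{k}$ consecutive integers contain exactly one multiple of $2^{k}$. The same computation lets a decoder recover the id of a heavy child from $id(u) + lw(u)$ and the child's number of trailing zeros.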

Pseudocode for the procedure Assign-Id can be found in Algorithm 2. 

Algorithm 2: Assign-Id

input : Node $u$, first available id $s$.

$id(u) \gets$ unique integer in $[s, s + 2^{k(u)} - 1]$ with at least $k(u)$ trailing zeroes in binary representation
for $j = 1 \to wc(u)$ do
    $t \gets id(u) + a_{j-1}(u) + 1$
    for $v \in \{w \in$ Light-Children($u$) $\mid wc(w) = j\}$ sorted by subtree size do
        Assign-Id($v$, $t$)
        $t \gets t + pw(v)$
    end
end
Assign-Id($heavy(u)$, $id(u) + lw(u)$)


4.4 Encoding of labels 

We are now ready to describe the actual labels. Let $u$ be a node. Let $apex(u) \in \{0, 1\}$ and
$leaf(u) \in \{0, 1\}$ be 1 if $u$ is an apex node and a leaf respectively. If $u$ is not a leaf, let $v$ be the
heavy child of $u$ and let $next(u) \in \{-1, 0, 1\}$ be such that $k(v) = k(u) + next(u)$. If $u$ is a leaf let
$next(u) = 0$. We identify $next(u)$ with the bit string of size two that is (00) if $next(u) = 0$, (01) if
$next(u) = 1$, and (11) if $next(u) = -1$. We let $aux(u)$ denote the following bit string:

$aux(u) = [k(u)]_\gamma \circ [wc(u)]_\gamma \circ [rld(u)]_\gamma \circ apex(u) \circ leaf(u) \circ next(u)$

For each $i = 1, 2, \ldots, wc(u)$ let $s_i$ be the bit string corresponding to the $(1 + \gamma(u)^{-3})$-approximation
$a_i(u)$ as described in Lemma 2. Let $M = \max_i |s_i|$ be the length of the longest of the bit strings
and let $r_i = 0^{M - |s_i|} \circ s_i$. Then $r_1, \ldots, r_{wc(u)}$ have length $M$. The table, $table(u)$, from which we
can decode any of $a_1(u), \ldots, a_{wc(u)}(u)$ in $O(1)$ time is defined as:

$table(u) = [M]_\gamma \circ r_1 \circ \cdots \circ r_{wc(u)}$

The label of $u$ is then defined as:

$\ell(u) = aux(u) \circ table(u) \circ wlsb(id(u), k(u))$

Figure 3 illustrates how the interval $I(u)$ is split into a part for each $i \le wc(u)$ in $table(u)$.

Figure 3: Illustration of the $table(u)$ structure, partitioning $u$’s assigned interval into a part for each
smaller weight class.
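The reason for padding all entries of $table(u)$ to a common length $M$ is that the $i$th entry then sits at a fixed offset and can be read without scanning. The sketch below illustrates this with bit strings modelled as Python strings; it assumes the entries are left-padded with zeros and returns $M$ separately instead of prepending $[M]_\gamma$. The function names are ours.

```python
def build_table(entries):
    """Pad each bit string to the common length M and concatenate them."""
    m = max(len(s) for s in entries)
    return m, "".join(s.rjust(m, "0") for s in entries)

def lookup(m, table, i):
    """Return the i-th entry (1-indexed, as in the text) by slicing at a fixed offset."""
    return table[(i - 1) * m : i * m]
```

With $wc(u)$ entries of $M$ bits each, the table occupies $M \cdot wc(u)$ bits plus the encoding of $M$.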


Label size Since $rld(u) = O(\gamma(u))$ by Lemma 4 we see that the length of $aux(u)$ is upper bounded
by:

$|aux(u)| \le 2\lceil \lg k(u) \rceil + O(\lg \gamma(u)) = O(\lg k(u))$

where we use that $\lg \gamma(u) = O(\lg k(u))$, which is true since $k(u) \ge k'(u) = \gamma(u) - \lceil 2 \lg \gamma(u) \rceil + 1$.

By Corollary 1 $lw(u) = O(|T_u^\ell|)$ and hence for every $i = 1, \ldots, wc(u)$: $\lg\lg a_i(u) \le \lg \gamma(u) + O(1)$.
By Lemma 2 we see that $M = O(\lg \gamma(u))$ where $M$ is the variable used to define $table(u)$.
Hence, the length of $table(u)$ is at most $O((\lg \gamma(u))^2) = O((\lg k(u))^2)$. Furthermore, the length of
$wlsb(id(u), k(u))$ is at most $\lceil \lg id(u) \rceil - k(u) \le \lg n - k(u) + O(1)$. Summarizing, the total label size
is upper bounded by:

$|\ell(u)| \le \lg n - k(u) + O((\lg k(u))^2) \le \lg n + O(1)$


4.5 Decoding 

We will now see how we from two labels $\ell(u), \ell(v)$ of nodes $u, v \in T$ can deduce whether $u$ is adjacent
to $v$. Lemma 8 below contains necessary and sufficient conditions for whether $u$ is a parent of $v$.

Lemma 8. Given two nodes $u, v$: $u$ is a parent of $v$ if and only if either:

1.1 $v$ is a heavy child (i.e. not an apex node).

1.2 $u$ is not a leaf.

1.3 $id(v)$ is the first number greater than or equal to $id(u) + lw(u)$ with at least $k(u) + next(u)$ trailing zeroes
in its binary representation.

or:

2.1 $v$ is an apex node.

2.2 $wc(v) \le wc(u)$.

2.3 $id(v) \in (id(u) + a_{wc(v)-1}(u), id(u) + a_{wc(v)}(u)]$.

2.4 If $wc(v) < wc(u)$ then $rld(v) = 0$, else (if $wc(v) = wc(u)$) $rld(v) = rld(u) + 1$.

Proof. First we will prove that if $v$ is a child of $u$ then either 1.1, 1.2, 1.3 or 2.1, 2.2, 2.3, 2.4 hold. If $v$ is the heavy child of $u$ then clearly 1.1 and 1.2 hold. By definition $id(v)$ is the unique number in $[id(u) + lw(u),\ id(u) + lw(u) + 2^{k(v)} - 1]$ with at least $k(v) = k(u) + next(u)$ trailing zeros in its binary representation and therefore 1.3 holds.

Now assume that $v$ is an apex node, i.e. that 2.1 holds. Then $v$ is contained in $u$'s light subtree and hence, by definition, 2.2 is true. By the definition of assign-id 2.3 holds. 2.4 follows from Lemma 6.

Now we will prove the converse. First assume that 1.1, 1.2, 1.3 hold. By 1.2, $u$ has a heavy child, $v'$. Since $k(v') = k(u) + next(u)$ we see that by 1.3 $id(v') = id(v)$ and hence $v = v'$ and $v$ is a child of $u$.

Now assume that 2.1, 2.2, 2.3, 2.4 hold. By 2.2 and 2.3 we know that $v$ is contained in the light subtree of $u$. Assume for the sake of contradiction that $v$ is not a child of $u$ and let $v'$ be the child of $u$ on the path from $v$ to $u$. By 2.3 we know that $wc(v) = wc(v')$. Since there must be at least one light edge on the path from $v$ to $v'$ (recall that both $v$ and $v'$ are apex nodes) Lemma 5 gives that $rld(v') < rld(v)$. But then 2.4 cannot be true. Contradiction. Hence the assumption was wrong and $v$ is a child of $u$. □

In order to check if $u$ is the parent of $v$ we use Lemma 8. For $v$ we need to decode:

$apex(v)$, $id(v)$, $wc(v)$, $rld(v)$

And for $u$ we need to decode:

$leaf(u)$, $wc(u)$, $id(u)$, $lw(u)$, $k(u)$, $next(u)$, $a_{wc(v)}(u)$, $rld(u)$

By the construction of the labels we can clearly do this in $O(1)$ time.
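Once these fields are available, the test of Lemma 8 is a short case distinction. The Python sketch below is an illustration only and rests on several assumptions: the container `Decoded` and the function names are hypothetical, the fields are taken to be already decoded, `a[i]` stands for $a_i(u)$ with `a[0] = 0`, apex nodes are assumed to have weight class at least 1, and condition 1.3 is read as "smallest integer at least $id(u) + lw(u)$", following the closed interval used in the proof of Lemma 8.

```python
from dataclasses import dataclass


@dataclass
class Decoded:
    """Label fields assumed to be already decoded (hypothetical container)."""
    id: int
    wc: int
    rld: int
    apex: bool
    leaf: bool
    k: int = 0
    next: int = 0
    a: tuple = (0,)  # a[i] stands for a_i(u); a[0] = 0


def first_with_trailing_zeros(s: int, t: int) -> int:
    """Smallest integer >= s whose binary representation ends in t zeros."""
    a = (s >> t) << t
    return s if a == s else a + (1 << t)


def is_parent(u: Decoded, v: Decoded) -> bool:
    """Evaluate the two alternatives of Lemma 8 on decoded fields."""
    if not v.apex:
        # Conditions 1.1-1.3: v must be the heavy child of a non-leaf u.
        if u.leaf:
            return False
        lw = 1 + u.a[u.wc]  # lw(u) = 1 + a_{wc(u)}(u)
        return v.id == first_with_trailing_zeros(u.id + lw, u.k + u.next)
    # Conditions 2.1-2.4: v is an apex node in the light subtree of u.
    if v.wc < 1 or v.wc > u.wc:  # assumption: weight classes start at 1 here
        return False
    lo, hi = u.id + u.a[v.wc - 1], u.id + u.a[v.wc]
    if not (lo < v.id <= hi):
        return False
    return v.rld == (0 if v.wc < u.wc else u.rld + 1)
```

Adjacency of $u$ and $v$ then amounts to `is_parent(u, v) or is_parent(v, u)`.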


5 Proof of weight bound 


Below follows the proof of Lemma 7. This is the main technical proof in this paper.

Proof of Lemma 7. We prove the lemma by induction on $x$. First we prove the lemma when $x = 0$. Consider a heavy path $p = (u_1, \ldots, u_{|p|})$ in order, where $u_1$ is closest to the root and has light height $x = 0$. Then $lw(u_i) = 1$ for all $i = 1, \ldots, |p|$ and:

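$$pw(u_1) = \sum_{i=1}^{|p|} \left( lw(u_i) + 2^{k(u_i)} - 1 \right) = |p| + \sum_{i=1}^{|p|} \left( 2^{k(u_i)} - 1 \right) = |T_{u_1}| + \sum_{i=1}^{|p|} \left( 2^{k(u_i)} - 1 \right)$$

Since $k'(u_i) = 0$ for $i = 2, \ldots, |p|$ we see that $k(u_i) = \max\{k'(u_1) + 1 - i, 0\}$ for any $i$. Hence:

$$\sum_{i=1}^{|p|} \left( 2^{k(u_i)} - 1 \right) \le \sum_{i=1}^{|p|} 2^{k'(u_1)+1-i} \le \sum_{i=1}^{\infty} 2^{k'(u_1)+1-i} = 2^{k'(u_1)+1} \le 2|T_{u_1}|$$

Hence $pw(u_1) \le 3|T_{u_1}|$ which proves the lemma for $x = 0$.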

Assume that the lemma holds for all nodes with light height $< x$, and consider a heavy path $p = (u_1, \ldots, u_{|p|})$ in order, where $u_1$ has light height $x$ and is the apex node on $p$. We wish to prove that the lemma holds for $u_1$. For each $i = 1, \ldots, |p|$ let $z_i$ be the maximum light height of any light child of $u_i$. Let $\sigma(u_i)$ be the sum of $pw(v)$ over all light children $v$ of $u_i$. For any $i$ we note that $z_i \le x - 1$ and so by the induction hypothesis


a{ui) ^ 3 



n 

i=i 


1 6 


We can upper bound $lw(u_i)$ in terms of $\sigma(u_i)$ by noting that we approximate the path weights of $u_i$'s children at most $wc(u_i)$ times:


lw{ui) = 1 + a^c{ui) ^ 1 + a{ui) • 1 + 




wc{ui) 


Since $u_i$ has a child with light height $z_i$ it must have a child with a subtree consisting of at least $2^{z_i+1} - 1$ nodes by Lemma 1. Therefore $\gamma(u_i) \ge z_i + 1$. Since $wc(u_i) = \lfloor \lg \gamma(u_i) \rfloor$ we can conclude that

1 \ 2 

1 + 


Combining these observations gives: 


^ 1 




^ 1 




^ 1 


+1)2 




{Z^ + 1)2 


( 3 ) 


By an analysis analogous to the one in Section 3 we see that: 

IpI bl IpI 97(«i) 

ti k ti 

For any $i = 2, \ldots, |p|$ we know that $\gamma(u_i) \ge z_i + 1$ and $2^{\gamma(u_i)} \le |T_{u_i}^\ell|$. Therefore:


( 4 ) 


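$$\sum_{i=2}^{|p|} \frac{2^{\gamma(u_i)}}{\gamma(u_i)^2} \le \sum_{i=2}^{|p|} \frac{|T_{u_i}^\ell|}{(z_i+1)^2}$$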

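By Lemma 1 $|T_{u_1}| \ge 2^{x+1} - 1$ and therefore $\gamma(u_1) \ge x$. Hence $\frac{2^{\gamma(u_1)}}{\gamma(u_1)^2} \le \frac{|T_{u_1}|}{x^2}$. Combining these two observations allows us to conclude that

$$\sum_{i=1}^{|p|} \frac{2^{\gamma(u_i)}}{\gamma(u_i)^2} \le \frac{|T_{u_1}|}{x^2} + \sum_{i=2}^{|p|} \frac{|T_{u_i}^\ell|}{(z_i+1)^2} \le 2 \sum_{i=1}^{|p|} \frac{|T_{u_i}^\ell|}{(z_i+1)^2} \qquad (5)$$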
When establishing the last inequality we use that $|T_{u_1}| = \sum_{i=1}^{|p|} |T_{u_i}^\ell|$. Now we see that


bl 


pw{ui) ^ n (^ 


i=l 

bl 




i=i 


2=1 




t=i 


w I ■ • ^ + 


W I ■ • ^ + 


{z^ + 1)2 
6 


bl , 

2=1 


12 


+ 1)2 


^3|r,jn(i 


J=1 


Here we used (3), (4), and (5) together with the definition of the path weight.


□ 


6 Running time 

In this section we argue that the encoding time of the labeling scheme is $O(n)$ and the decoding time is $O(1)$, thus finishing the proof of Theorem 2.

6.1 Encoding time 

To bound the encoding time we will need to bound the total number of nodes with a given weight 
class k. We will use the following notion of contribution: 

Definition 3. For an apex node $u$ we define $contrib(u) = V(T_u)$ and for a heavy child $u$ we define $contrib(u) = V(T_u^\ell)$. We say that a node $v \in contrib(u)$ is contributing to $u$.

Note that by this definition, the weight class of a node $u$ is exactly

$$wc(u) = \lfloor \lg\lg |contrib(u)| \rfloor .$$


We will need the following lemma: 

Lemma 9. Given a tree $T$ with $|T| = n$, the number of nodes $u$ with $wc(u) = k$ is bounded by

$$n \cdot \frac{2^{k+1} + 1}{2^{2^k}} .$$


Proof. Consider any node $u \in T$. We will first bound the number of nodes $v$ with $wc(v) = k$ such that $u \in contrib(v)$. Observe that a node $u$ contributes to exactly all apex nodes which are ancestors of $u$, as well as the heavy child $v$ of maximum depth for each heavy path $p$ such that $v$ is an ancestor of $u$ (note that such $v$ might not exist for a heavy path $p$). Thus at least half the nodes that $u$ contributes to are apex nodes.

Let $w_1$ be the apex node in $T$ of minimum depth such that $w_1$ is an ancestor of $u$ and $wc(w_1) = k$. Then $|contrib(w_1)| < 2^{2^{k+1}}$. Let $w_i$ be the first apex node on the path from $w_{i-1}$ to $u$ (excluding $w_{i-1}$ itself). Then for all $i$ such that $w_i$ is well defined we have

$$|contrib(w_i)| \le |contrib(w_{i-1})| / 2 ,$$

and thus $|contrib(w_{2^k})| < 2^{2^k}$ implying that $wc(w_{2^k}) < k$. Thus $u$ can contribute to at most $2^{k+1} + 1$ nodes with weight class $k$.

It follows that the total number of nodes contributing to nodes of weight class $k$ is bounded by $n \cdot (2^{k+1} + 1)$. Since each node of weight class $k$ has at least $2^{2^k}$ nodes contributing to it, we can bound the total number of nodes with weight class $k$ by

$$n \cdot \frac{2^{k+1} + 1}{2^{2^k}} .$$

□
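The counting bound of Lemma 9 can be checked empirically on small trees. The following Python sketch is an illustration under assumptions that are not spelled out in this section: the heavy child of a node is taken to be a child with the largest subtree, every node that is not a heavy child (including the root) is an apex node, the light subtree of $u$ is $T_u$ without the subtree of the heavy child, and the weight class is clamped to 0 when $\lg\lg$ would be undefined. The function names and the parent-array input format are hypothetical.

```python
import random


def weight_class_counts(parent: list[int]) -> dict[int, int]:
    """Count nodes per weight class; parent[0] == -1 and parent[v] < v for v > 0."""
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):            # children are processed before parents
        size[parent[v]] += size[v]
    heavy = [-1] * n                         # assumed: heavy child = largest subtree
    for v in range(1, n):
        p = parent[v]
        if heavy[p] == -1 or size[v] > size[heavy[p]]:
            heavy[p] = v
    counts: dict[int, int] = {}
    for u in range(n):
        is_apex = u == 0 or heavy[parent[u]] != u
        heavy_size = size[heavy[u]] if heavy[u] != -1 else 0
        contrib = size[u] if is_apex else size[u] - heavy_size
        gamma = contrib.bit_length() - 1     # floor(lg |contrib(u)|)
        wc = max(gamma, 1).bit_length() - 1  # floor(lg gamma), clamped to 0
        counts[wc] = counts.get(wc, 0) + 1
    return counts


def lemma9_holds(parent: list[int]) -> bool:
    """Check count(k) * 2^(2^k) <= n * (2^(k+1) + 1) for every occurring k."""
    n = len(parent)
    return all(c * 2 ** (2 ** k) <= n * (2 ** (k + 1) + 1)
               for k, c in weight_class_counts(parent).items())
```

For example, a random recursive tree can be generated with `parent = [-1] + [random.randrange(v) for v in range(1, n)]` and passed to `lemma9_holds`.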

The proof of Lemma 9 is illustrated in Figure 4. The figure illustrates how each node $u$ contributes to all apex nodes on the path from $u$ to the root, and how the number of contributing nodes at least doubles per apex node on this path.

Figure 4: Illustration of Lemma 9. The grey nodes are the ones that $u$ is contributing to. For each grey apex node on the path from $u$ to the root, the number of contributing nodes grows by at least a factor of 2.

We are now ready to bound the encoding time. First recall that we are using the word-RAM model with word size $c \log n$ for some sufficiently large constant $c$ such that the entire label $\ell(u)$ fits in one word. We are thus able to create the Elias $\gamma$ code of $k(u)$, $wc(u)$, $rld(u)$, and $M(u)$ in $O(1)$ time for each node $u$ using standard word operations.

We may assume that the children of each node are sorted by subtree size. Otherwise we can ensure this using e.g. bucket sort in $O(n)$ time.

Since all components of $aux(u)$ other than $k(u)$ can be calculated using a simple DFS-traversal in $O(n)$ time, we see that the total encoding time is dominated by the running time of Algorithm 1, Algorithm 2, and the time to construct $table(u)$ from the $a_i(u)$s. For Algorithm 2 we first observe that line 1 can be done in $O(1)$ time using the following approach:

1. Let $a$ be the integer resulting from setting the last $k(u)$ bits of the binary representation of $s$ to 0.

2. If $a = s$, then return $s$.

3. Otherwise return $a + 2^{k(u)}$.
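The three steps map directly onto shift operations. A minimal Python sketch, in which the function name `round_up` and the parameter `k` (standing for $k(u)$) are hypothetical and Python integers stand in for machine words:

```python
def round_up(s: int, k: int) -> int:
    """Smallest integer >= s whose binary representation ends in k zeros."""
    a = (s >> k) << k    # step 1: clear the last k bits of s
    if a == s:           # step 2: s already has k trailing zeros
        return s
    return a + (1 << k)  # step 3: move to the next multiple of 2^k
```

Each line uses a constant number of shifts, comparisons and additions, which matches the $O(1)$ claim on a word-RAM.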

Each of the three steps can be done in $O(1)$ time using word operations. The rest of Algorithm 2 is a DFS-traversal, which runs in $O(n)$ time total. For the construction of $table(u)$, observe that all of $table(u)$ fits in a word, so we can calculate each $r_i(u)$ in $O(1)$ time. The total construction time over all nodes of $T$ is thus bounded by:

$$\begin{aligned}
\sum_{u \in T} O(wc(u)) &= O\left( \sum_{k=0}^{\lfloor \lg\lg n \rfloor} k \cdot |\{ w \in T \mid wc(w) = k \}| \right) \\
&\le O\left( \sum_{k=0}^{\lfloor \lg\lg n \rfloor} k n \cdot \frac{2^{k+1} + 1}{2^{2^k}} \right) \\
&\le O\left( n \cdot \sum_{k=0}^{\infty} \frac{k \cdot 2^k}{2^{2^k}} \right) = O(n) .
\end{aligned} \qquad (6)$$

Here, the second line follows by Lemma 9. For Algorithm 1 we see that the total time spent in the loop of line 3 to line 8 for all nodes $u \in T$ is bounded by

$$\sum_{u \in T} O(|children(u)| + wc(u)) .$$

By (6) this is $O(n)$. The rest of Algorithm 1 spends time proportional to the length of the heavy path the function has been called with, which sums to $O(n)$ over all heavy paths. Note that line 8 is calculated in $O(1)$ time using Lemma 2.

By summing up the three different parts we see that the total encoding time of the labeling scheme is $O(n)$.

6.2 Decoding time

Using the conditions of Lemma 8 we will bound the decoding time of the labeling scheme:

Recall that we are able to decode each of $k(u)$, $wc(u)$, $rld(u)$, $apex(u)$, $leaf(u)$, $next(u)$, and $M(u)$ in $O(1)$ time. Doing this we also locate the beginning of $a_1(u)$ in the bit string (label). Let this bit position be denoted by $x$.

Knowing $x$, $M(u)$, and $wc(v)$ we can read the $(wc(v) - 1)$st and $wc(v)$th entries of $table(u)$ in $O(1)$ time, since these are located exactly at bit positions $x + M(u) \cdot (wc(v) - 2)$ and $x + M(u) \cdot (wc(v) - 1)$. If $wc(v) = 1$ we know that $a_0(u) = 0$. Similarly we know that $wlsb(id(u), k(u))$ begins at bit position $x + M(u) \cdot (wc(u) - 1)$ and consists of the remaining bits. We can do the same for $v$, thus decoding each relevant component of $\ell(u)$ and $\ell(v)$ in $O(1)$ time.
The conditions 1.1, 1.2 and 2.1-2.4 can now be checked in $O(1)$ time by using the corresponding values. For condition 1.3 we need to be able to find the smallest integer greater than $id(u) + lw(u)$ with at least $k(u) + next(u)$ trailing zeroes. Observe that $lw(u) = 1 + a_{wc(u)}(u)$ can be obtained in $O(1)$ time from $table(u)$ in the same manner as $a_{wc(v)}(u)$ was. Finding the smallest such integer can now be done in $O(1)$ time by using the same procedure as in the previous section.

This finishes the proof of Theorem 2.
APPENDIX

A Adjacency labeling for trees explicitly listed as an open problem 

Let $T_n$ denote the family of trees with $n$ nodes. In the quotes below “universal graph” is “induced universal”.

Chung [27, emphasized on page 452-453] “What is the correct order of magnitude for $g_v(T_n)$? [...] It would be of particular interest to sharpen the bounds for $g_v(T_n)$ [...]”

In [45, page 465] “Proving or disproving the existence of a universal graph with a linear number 
of nodes for the class of n-node trees is a central open problem in the design of informative labeling 
schemes.” 

In [49, page 592] “[...] prove an optimal bound for trees (up to an additive constant) which is 
still open.” 

In [19, page 143-144] “leaving open the question of whether trees enjoy a labeling scheme with $\log n + O(1)$ bit labels [...] In particular, for adjacency queries in trees, the current lower bound is $\log n$ and the upper bound is $\log n + O(\log^* n)$”

In [48, page 42] “Induced-universal graph for n-node trees of $O(n)$ size?”

In [57] “The question of matching upper and lower bounds for the sizes of the universal graphs 
for these families still remain open.” In this paper trees and graphs with bounded arboricity are 
two of the main families being considered. 

In [44, page 132] “Proving or disproving the existence of an adjacency labeling scheme for trees using labels of size $\log n + O(1)$ remains a central open problem in the design of informative labeling schemes.”